<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE topic
  PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
<topic id="topic1">
  <title id="pg137116">analyzedb</title>
  <body>
    <p>A utility that performs <codeph>ANALYZE</codeph> operations on tables incrementally and
      concurrently. <ph>For append optimized tables, <cmdname>analyzedb</cmdname> updates statistics
        only if the statistics are not current.</ph></p>
    <section>
      <title>Synopsis</title>
      <codeblock>analyzedb -d &lt;dbname>
   { -s &lt;schema>  | 
   { -t &lt;schema>.&lt;table> 
     [ -i &lt;col1>[,&lt;col2>, ...] | 
       -x &lt;col1>[,&lt;col2>, ...] ] } |
     { -f | --file} &lt;config-file> }
   [ -l | --list ]
   [ --gen_profile_only ]   
   [ -p &lt;parallel-level> ]
   [ --full ]
   [ --skip_root_stats ]
   [ --skip_orca_root_stats ]
   [ -v | --verbose ]
   [ -a ]

analyzedb { --clean_last | --clean_all }
analyzedb --version
analyzedb { -? | -h | --help }</codeblock>
    </section>
    <section>
      <title>Description</title>
      <p>The <cmdname>analyzedb</cmdname> utility updates statistics on table data for the specified
        tables in a Greenplum database incrementally and concurrently. </p>
      <p>While performing <xref href="../../ref_guide/sql_commands/ANALYZE.xml">ANALYZE</xref>
        operations, <cmdname>analyzedb</cmdname> creates a snapshot of the table metadata and stores
        it on disk on the master host. An <codeph>ANALYZE</codeph> operation is performed only
        if the table has been modified. If a table or partition has not been modified since the last
        time it was analyzed, <cmdname>analyzedb</cmdname> automatically skips the table or
        partition because it already contains up-to-date statistics.</p>
      <ul id="ul_vgw_25c_3r">
        <li>For append optimized tables, <cmdname>analyzedb</cmdname> updates statistics
          incrementally, if the statistics are not current. For example, if table data is changed
          after statistics were collected for the table. If there are no statistics for the table,
          statistics are collected. </li>
        <li>For heap tables, statistics are always updated.</li>
      </ul>
      <p>Specify the <codeph>--full</codeph> option to update append-optimized table statistics even
        if the table statistics are current.</p>
      <p dir="ltr">By default, <cmdname>analyzedb</cmdname> creates a maximum of 5 concurrent
        sessions to analyze tables in parallel. For each session, <cmdname>analyzedb</cmdname>
        issues an <cmdname>ANALYZE</cmdname> command to the database and specifies different table
        names. The <codeph>-p</codeph> option controls the maximum number of concurrent sessions. </p>
      <sectiondiv>
        <b>Partitioned Append-Optimized Tables</b>
        <p>For a partitioned, append-optimized table, <cmdname>analyzedb</cmdname> checks the
          partitioned table root partition and leaf partitions. If needed, the utility updates
          statistics for non-current partitions and the root partition. For information about how
          statistics are collected for partitioned tables, see <codeph><xref
              href="../../ref_guide/sql_commands/ANALYZE.xml"
        >ANALYZE</xref></codeph>.</p>
        <p><codeph>analyzedb</codeph> must sample additional partitions within a partitioned
          table when it encounters a stale partition, even when statistics are already
          collected. Consider it a best practice to run <codeph>analyzedb</codeph> on the
          root partition any time that you add a new partition(s) to a partitioned table.
          This operation both analyzes the child leaf partitions in parallel and merges
          any updated statistics into the root partition.</p>
    </sectiondiv>
    </section>
    <section><title>Notes</title><p>The <cmdname>analyzedb</cmdname> utility updates append
        optimized table statistics if the table has been modified by DML or DDL commands, including
          <cmdname>INSERT</cmdname>, <cmdname>DELETE</cmdname>, <cmdname>UPDATE</cmdname>,
          <cmdname>CREATE TABLE</cmdname>, <cmdname>ALTER TABLE</cmdname> and
          <cmdname>TRUNCATE</cmdname>. The utility determines if a table has been modified by
        comparing catalog metadata of tables with the previous snapshot of metadata taken during a
        previous <cmdname>analyzedb</cmdname> operation. The snapshots of table metadata are stored
        as state files in the directory <codeph>db_analyze/&lt;db_name>/&lt;timestamp></codeph> in
        the Greenplum Database master data directory. </p>The utility preserves old snapshot
      information from the past 8 days, and the 3 most recent state directories regardless of age,
      while all other directories are automatically removed. You can also specify the
        <codeph>--clean_last</codeph> or <codeph>--clean_all</codeph> option to remove state files
      generated by <cmdname>analyzedb</cmdname>. <p>If you do not specify a table, set of tables, or
        schema, the <cmdname>analyzedb</cmdname> utility collects the statistics as needed on all
        system catalog tables and user-defined tables in the database. </p><p>External tables are
        not affected by <cmdname>analyzedb</cmdname>.</p><p>Table names that contain spaces are not
        supported.</p><p>Running the <codeph>ANALYZE</codeph> command on a table, not using the
          <codeph>analyzedb</codeph> utility, does not update the table metadata that the
          <codeph>analyzedb</codeph> utility uses to determine whether table statistics are up to
        date. </p></section>
    <section>
      <title>Options</title>
      <parml>
        <plentry>
          <pt>--clean_last</pt>
          <pd>Remove the state files generated by last <cmdname>analyzedb</cmdname> operation. All
            other options except <cmdname>-d</cmdname> are ignored.</pd>
        </plentry>
        <plentry>
          <pt>--clean_all</pt>
          <pd>Remove all the state files generated by <cmdname>analyzedb</cmdname>. All other
            options except <cmdname>-d</cmdname> are ignored.</pd>
        </plentry>
        <plentry>
          <pt>-d <varname>dbname</varname></pt>
          <pd>Specifies the name of the database that contains the tables to be analyzed. If this
            option is not specified, the database name is read from the environment variable
              <codeph>PGDATABASE</codeph>. If <codeph>PGDATABASE</codeph> is not set, the user name
            specified for the connection is used.</pd>
        </plentry>
        <plentry>
          <pt>-f <varname>config-file</varname> | --file <varname>config-file</varname></pt>
          <pd>Text file that contains a list of tables to be analyzed. A relative file path from
            current directory can be specified.</pd>
          <pd>The file lists one table per line. Table names must be qualified with a schema name.
            Optionally, a list of columns can be specified using the <cmdname>-i</cmdname> or
              <cmdname>-x</cmdname>. No other options are allowed in the file. Other options such as
              <codeph>--full</codeph> must be specified on the command line.</pd>
          <pd>Only one of the options can be used to specify the files to be analyzed:
              <codeph>-f</codeph> or <codeph>--file</codeph>, <codeph>-t</codeph> , or
              <codeph>-s</codeph>.</pd>
          <pd>When performing <cmdname>ANALYZE</cmdname> operations on multiple tables,
              <cmdname>analyzedb</cmdname> creates concurrent sessions to analyze tables in
            parallel. The <codeph>-p</codeph> option controls the maximum number of concurrent
            sessions. </pd>
          <pd>In the following example, the first line performs an <cmdname>ANALYZE</cmdname>
            operation on the table <codeph>public.nation</codeph>, the second line performs an
              <cmdname>ANALYZE</cmdname> operation only on the columns <codeph>l_shipdate</codeph>
            and <codeph>l_receiptdate</codeph> in the table
            <codeph>public.lineitem</codeph>.<codeblock>public.nation
public.lineitem -i l_shipdate,l_receiptdate </codeblock></pd>
        </plentry>
        <plentry>
          <pt>--full</pt>
          <pd>Perform an <cmdname>ANALYZE</cmdname> operation on all the specified tables. The
            operation is performed even if the statistics are up to date.</pd>
        </plentry>
        <plentry>
          <pt>--gen_profile_only</pt>
          <pd>Update the <codeph>analyzedb</codeph> snapshot of table statistics information without
            performing any <codeph>ANALYZE</codeph> operations. If other options specify tables or a
            schema, the utility updates the snapshot information only for the specified tables.</pd>
          <pd>Specify this option if the <codeph>ANALYZE</codeph> command was run on database tables
            and you want to update the <codeph>analyzedb</codeph> snapshot for the tables. </pd>
        </plentry>
        <plentry>
          <pt>-i <varname>col1</varname>,<varname>col2</varname>, ...</pt>
          <pd>Optional. Must be specified with the <cmdname>-t</cmdname> option. For the table
            specified with the <cmdname>-t</cmdname> option, collect statistics only for the
            specified columns. </pd>
          <pd>Only <cmdname>-i</cmdname>, or <cmdname>-x</cmdname> can be specified. Both options
            cannot be specified.</pd>
        </plentry>
        <plentry>
          <pt>-l | --list</pt>
          <pd>Lists the tables that would have been analyzed with the specified options. The
              <cmdname>ANALYZE</cmdname> operations are not performed.</pd>
        </plentry>
        <plentry>
          <pt>-p <varname>parallel-level</varname></pt>
          <pd>The number of tables that are analyzed in parallel. <varname>parallel level</varname>
            can be an integer between 1 and 10, inclusive. Default value is 5. </pd>
        </plentry>
        <plentry>
          <pt>--skip_root_stats</pt>
          <pd>This option is no longer used, you may remove it from your scripts.</pd>
        </plentry>
        <plentry>
          <pt><codeph>--skip_orca_root_stats</codeph></pt>
          <pd>
            <note>Do not use this option if GPORCA is enabled.</note>
          </pd>
          <pd>Use this option if you find that <codeph>ANALYZE ROOTPARTITION</codeph> commands take
            a very long time to complete.</pd>
          <pd>
            <note type="warning">After you run <codeph>analyzedb</codeph> with this option,
              subsequent <codeph>analyzedb</codeph> executions will not update root partition
              statistics except when changes have been made to the table.</note>
          </pd>
        </plentry>
        <plentry>
          <pt>-s <varname>schema</varname></pt>
          <pd>Specify a schema to analyze. All tables in the schema will be analyzed. Only a single
            schema name can be specified on the command line. </pd>
          <pd>Only one of the options can be used to specify the files to be analyzed:
              <codeph>-f</codeph> or <codeph>--file</codeph>, <codeph>-t</codeph> , or
              <codeph>-s</codeph>.</pd>
        </plentry>
        <plentry>
          <pt>-t <varname>schema</varname>.<varname>table</varname></pt>
          <pd> Collect statistics only on <varname>schema</varname>.<varname>table</varname>. The
            table name must be qualified with a schema name. Only a single table name can be
            specified on the command line. You can specify the <codeph>-f</codeph> option to specify
            multiple tables in a file or the <codeph>-s</codeph> option to specify all the tables in
            a schema.</pd>
          <pd>Only one of these options can be used to specify the files to be analyzed:
              <codeph>-f</codeph> or <codeph>--file</codeph>, <codeph>-t</codeph> , or
              <codeph>-s</codeph>.</pd>
        </plentry>
        <plentry>
          <pt>-x <varname>col1</varname>,<varname>col2</varname>, ...</pt>
          <pd>Optional. Must be specified with the <cmdname>-t</cmdname> option. For the table
            specified with the <cmdname>-t</cmdname> option, exclude statistics collection for the
            specified columns. Statistics are collected only on the columns that are not
            listed.</pd>
          <pd>Only <cmdname>-i</cmdname>, or <cmdname>-x</cmdname> can be specified. Both options
            cannot be specified.</pd>
        </plentry>
        <plentry>
          <pt>-a</pt>
          <pd> Quiet mode. Do not prompt for user confirmation. </pd>
        </plentry>
        <plentry>
          <pt>-h | -? | --help </pt>
          <pd> Displays the online help.</pd>
        </plentry>
        <plentry>
          <pt> -v | --verbose</pt>
          <pd>If specified, sets the logging level to verbose to write additional information the
            log file and to the command line during command execution. The information includes a
            list of all the tables to be analyzed (including child leaf partitions of partitioned
            tables). Output also includes the duration of each <codeph>ANALYZE</codeph>
            operation.</pd>
        </plentry>
        <plentry>
          <pt>--version</pt>
          <pd>Displays the version of this utility.</pd>
        </plentry>
      </parml>
    </section>
    <section>
      <title>Examples</title>
      <p>An example that collects statistics only on a set of table columns. In the database
          <codeph>mytest</codeph>, collect statistics on the columns <codeph>shipdate</codeph> and
          <codeph>receiptdate</codeph> in the table <codeph>public.orders</codeph>:</p>
      <codeblock>analyzedb -d mytest -t public.orders -i shipdate,receiptdate</codeblock>
      <p>An example that collects statistics on a table and exclude a set of columns. In the
        database <codeph>mytest</codeph>, collect statistics on the table
          <codeph>public.foo</codeph>, and do not collect statistics on the columns
          <codeph>bar</codeph> and <codeph>test2</codeph>.</p>
      <codeblock>analyzedb -d mytest -t public.foo -x bar,test2</codeblock>
      <p>An example that specifies a file that contains a list of tables. This command collect
        statistics on the tables listed in the file <codeph>analyze-tables</codeph> in the database
        named <codeph>mytest</codeph>.</p>
      <codeblock>analyzedb -d mytest -f analyze-tables</codeblock>
      <p>If you do not specify a table, set of tables, or schema, the <cmdname>analyzedb</cmdname>
        utility collects the statistics as needed on all catalog tables and user-defined tables in
        the specified database. This command refreshes table statistics on the system catalog tables
        and user-defined tables in the database <codeph>mytest</codeph>.</p>
      <codeblock>analyzedb -d mytest</codeblock>
      <p>You can create a PL/Python function to run the <codeph>analyzedb</codeph> utility as a
        Greenplum Database function. This example <codeph>CREATE FUNCTION</codeph> command creates a
        user defined PL/Python function that runs the <codeph>analyzedb</codeph> utility and
        displays output on the command line. Specify <codeph>analyzedb</codeph> options as the
        function parameter.
        <codeblock>CREATE OR REPLACE FUNCTION analyzedb(params TEXT)
  RETURNS VOID AS
$BODY$
    import subprocess
    cmd = ['analyzedb', '-a' ] + params.split()
    p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

    # verbose output of process
    for line in iter(p.stdout.readline, ''):
        plpy.info(line);

    p.wait()
$BODY$
LANGUAGE plpythonu VOLATILE;</codeblock></p>
      <p>When this <codeph>SELECT</codeph> command is run by the gpadmin user, the
          <codeph>analyzedb</codeph> utility performs an analyze operation on the table
          <codeph>public.mytable</codeph> that is in the database <codeph>mytest</codeph>.</p>
      <codeblock>SELECT analyzedb('-d mytest -t public.mytable') ;</codeblock>
      <note>To create a PL/Python function, the PL/Python procedural language must be registered as
        a language in the database. For example, this <codeph>CREATE LANGUAGE</codeph> command run
        as gpadmin registers PL/Python as an untrusted
        language:<codeblock>CREATE LANGUAGE plpythonu;</codeblock></note>
    </section>
    <section>
      <title>See Also</title>
      <p>
        <ph otherprops="op-print"><cmdname>ANALYZE</cmdname> in the <cite>Greenplum Database
            Reference Guide</cite></ph>
        <ph>
          <xref href="../../ref_guide/sql_commands/ANALYZE.xml#topic1"/>
        </ph>
      </p>
    </section>
  </body>
</topic>
